Steep terrain and high frequency of tropical rainstorms make landslide occurrence on natural terrain a common phenomenon in Hong Kong. The present article reports on the use of a geographical information system (GIS) database, compiled primarily from existing digital maps and aerial photographs, to describe the physical characteristics of landslides and the statistical correlations between landslide frequency and terrain variables on Lantau Island in Hong Kong. This database is then used to obtain a logistic multiple regression model to predict landslide susceptibility. Slope gradient, lithology, elevation, slope aspect, and land use cover are indicated as statistically significant in predicting landslide susceptibility, whereas slope morphology and proximity to drainage line are not important and are thus excluded from the model. This model is then imported back into the GIS to produce a map of predicted landslide susceptibility. This study demonstrates that landslide susceptibility can be effectively modeled by using GIS technology and logistic multiple regression analysis.
Introduction
Landslides on mountainous terrain often occur during or after heavy rainfall, resulting in loss of life and damage to the natural or the built environment (or both). Areas prone to landslides should therefore be identified in advance to reduce such damage. Landslide susceptibility mapping can provide much of the basic information essential for hazard mitigation through proper project planning and implementation.
Landslide susceptibility can be defined as the probability that a landslide will occur in a specific area in the future and can be measured from the correlation between determining factors and the spatial distribution of landslides (Brabb 1984). A variety of techniques, such as qualitative, statistical, and deterministic approaches, have been proposed to estimate landslide susceptibility. In qualitative approaches several maps representing the spatial distribution of factors that may influence the occurrence of landslides are combined to produce a susceptibility map, using subjective decision rules, based on the experience of the geoscientists involved (Anbalagan 1992; Pachauri and Pant 1992; Sarkar et al 1995). In statistical approaches statistical analysis is used to determine the relation between landslide susceptibility and a number of factors that are considered to have an influence on landslide occurrence. This relation is then applied to map landslide susceptibility (Yin and Yan 1988; Carrara et al 1991; Dhakal et al 1999). Both qualitative and statistical approaches are based on the general principle that “the past and the present are the keys to the future,” that is, future slope failures will more likely occur under conditions that led to past and present landslides (Carrara et al 1991). Deterministic approaches are based on slope stability analyses and are only applicable when the ground conditions are fairly uniform across the study area, and the landslide types are known and relatively easy to analyze.
Geographical information systems (GIS) are technologically capable of facilitating various functions of geospatial data handling, analysis, and management and has been employed to model and predict landslide susceptibility. The overlay operation commonly associated with GIS is useful in both heuristic and statistical approaches (Gupta and Joshi 1989; Carrara et al 1991; Wang and Unwin 1992; Van Westen et al 1997). The infinite slope stability model has also been incorporated into the GIS to calculate the spatial distribution of the factor of safety within a given region, based on the assumption that landslides generally occur along shallow failure surfaces (Terlien et al 1995; Wu and Sidle 1995; Van Westen et al 1997). The advantage of incorporating the deterministic models into the GIS is that they permit quantitative factors of safety to be quickly calculated for a given area. The main problem involved is the high degree of simplification that is usually necessary for the use of such models.
The present article reports on the use of a GIS database, compiled primarily from existing digital maps and aerial photographs, to describe the physical characteristics of landslides and the statistical correlations between landslide frequency and terrain variables on Lantau Island in Hong Kong. This database is then exported to an external statistical package to obtain a logistic multiple regression model for predicting landslide susceptibility. The model is then imported back into the GIS to produce a map of predicted landslide susceptibility.
Description of the study area
Lantau Island is located in the southwestern part of the territory of Hong Kong and is the largest island within the territory (Figure 1). Primarily because of its steep terrain, the island is virtually undeveloped and uninhabited, with the exception of small coastal patches of flat land. Land with slope gradients greater than 25° accounts for 44% of the total land area. Elevation ranges from sea level to over 900 m above sea level and changes abruptly.
The bedrock geology of the study area is dominated by Mesozoic volcanic rocks and the younger intrusive igneous rocks. The volcanic rocks, which comprise tuffs and lavas with intercalated sedimentary rocks, crop out in the west and north of the study area. Intrusive rocks consist mainly of granites and dykes of various compositions. The Paleozoic sedimentary strata comprising metamorphosed siltstone, sandstone, and carbonaceous siltstone occur as a small outcrop in the northwest coastal portions of the study area. Superficial deposits of the Quaternary age form large, flat-lying areas. In hilly terrain, colluvium, including debris flows and other slope debris deposits, mostly of the late Pleistocene to the Holocene, commonly mantles side slopes and valleys as a result of numerous individual episodes of mass wasting and erosion during the Quaternary period. Colluvium occurs as relatively thin ribbon-like deposits filling drainage courses. However, there are considerably thicker deposits of greater extent on some hillslopes in the study area. The colluvium derived from volcanics typically consists of subangular cobbles and boulders, feldsparphyric rhyolite with some tuff, in a matrix of mottled, reddish brown, and yellowish brown gravelly, sandy, slightly clayey silt. Small alluvial deposits occur in hilly areas, but alluvium is generally restricted to fans developed downslope of the colluvial deposits (Geotechnical Control Office 1988a,b). A regolith, or mantle of weathered rock, occurs over most of the study area. Intrusive rocks and the Paleozoic sedimentary rocks are mostly deeply weathered and eroded and form the lower ground. The acidic volcanic rocks are more resistant to deep weathering and erosion. The study area is structurally affected by 2 sets of faults trending NE-NNE and NNW-NW, respectively.
The climate is subtropical and monsoonal, with mild, dry winters and hot, humid summers. Rainfall is heavy and occasionally intense during the rainstorms and typhoons. The hillslopes are drained by numerous small streams, most of which only flow during or after heavy or prolonged rainfall. The hillsides are often deeply incised as a result of erosion caused by ephemeral streams. In general, piezometric records from previous site investigations indicate that the regional groundwater table lies either just within the slightly to moderately weathered bedrock or within the overlying saprolite (Franks 1998). The relatively high permeability of the colluvium, when compared with the underlying saprolite or weathered bedrock, allows for the development of transient perched groundwater tables at the interface during or following periods of intense rainfall.
Source and preprocessing of data
The study area was examined using ArcView GIS software. The data needed for this study include topography, land use classification, a terrain morphologic map, superficial and bedrock geology, and the locations of landslides. All locational, geological, and geomorphologic features provided by these different thematic maps were imported into the ArcView GIS, or digitized using the GIS PC Arc/Info software, and then transferred to ArcView for subsequent analyses.
Contour lines and drainage lines were obtained from 1:20,000-scale topographic maps. Elevation data were obtained from the digital elevation model derived from the 1:20,000-scale digital contour lines of the area. Two data layers were derived from these elevation data, namely, slope aspect and slope gradient. Proximity to drainage line was calculated using GIS functions.
Superficial and bedrock geological data were obtained from 1:20,000-scale solid and superficial geological maps developed by the Hong Kong Geological Survey of the Geotechnical Engineering Office (GEO), formerly known as the Geotechnical Control Office. The maps covering the study area describe the geological groups, each comprising geological units of broadly similar lithology. For ease of analysis, the groups were further reclassified into 9 categories: (1) alluvial, terrace, and beach deposits; (2) debris flow deposits and talus; (3) sedimentary rock; (4) metasedimentary rock; (5) intrusive rock; (6) minor intrusive rock; (7) ash tuff, tuffite, and tuff breccia; (8) trachydacite, dacite, and rhyolite lava; and (9) volcaniclastic sedimentary rock, based on stratigraphy and genesis.
The 1:20,000-scale digital terrain classification maps covering the study area, developed by the GEO, were available to the authors. This data set contains terrain classification information that includes erosion and stability as well as terrain components and morphology, which was derived primarily using the aerial photography interpretation technique (Geotechnical Control Office 1988a,b). Based on the terrain classification information, terrain morphology, which describes the physical appearance of the slope and the general shape of the slope profile (straight, concave, or convex), is extracted and then reclassified into 10 classes for simplicity. These are: hillcrest or ridge, straight sideslope, concave sideslope, convex sideslope, straight footslope, concave footslope, convex footslope, drainage plain, rock outcrop, and others.
The landslide database used was derived from the GEO work in which landslide locations and trails were digitized from 23 temporal sets of 1:20,000- to 1:40,000-scale stereoscopic aerial photographs dating from 1945 to 1994 (King 1999). The aerial photographs used thus cover a period of 50 years. On aerial photographs, landslides were observed as having a distinctive light tone generally bare of vegetation (King 1999). Because landslides as old as about 10 years were visible before revegetation masked most of the scars, the aerial photographs record a period of landslide activity of about 60 years. The location of each identified landslide crown was recorded on the 1:5000-scale base map. The width of the landslide scars was classified as either greater or less than 20 m wide, and the ground slope angle across the landslide head, calculated from the distance between the steepest 2 adjacent contours on the 1:5000-scale map, was noted. Figure 2 shows the locations of the landslides.
A 1:50,000-scale coverage of land use types for the whole territory of Hong Kong, based on the interpretation of SPOT images by Chi (personal communication) in 1998, is used for the analysis. Although 35 land use types were mapped, these were reduced to 6 for the purposes of this study. They are (1) developed land, such as cropland, roads, structures, reservoirs, and reclamation; (2) forestland; (3) shrub-forestland; (4) dense shrub-grassland with a shrub coverage of less than 40%; (5) moderate grassland with >50% coverage; and (6) sparse grassland on rock outcrop-dominated areas. It should be noted that figures for land use cover should be considered only as estimates because of increased current and future development of coastal flat-lying lands and possible changes in land use types over the past few decades.
The aforementioned vector data sets were then rasterized into grid cells for subsequent analyses. The grid cells were 20 m × 20 m. This was based on the scale of the topographic map used and the size of the landslides, most of which are less than 20 m in width.
Physical characteristics
Description of landslides
Landslide classification systems are usually based on a combination of material and movement mechanism. Using the system proposed by Cruden and Varnes (1996), most of the landslides in the study area can be characterized as probably debris slides, debris flows, complex debris slide-flows, or composite debris slide-flow falls, all of which may be either open slope or channelized (King 1999). About 79.2% of the 2135 landslides recorded were less than 20 m in source width.
Field observations revealed that the majority of landslides had the following characteristics:
The volume of failure generally ranged from 10s of cubic meters to over 1000 m3.
The failure depth generally varied from about 0.5 to 3 m.
The failures generally occurred along the colluvium-bedrock contact, and the predominant failure mode was of the translational type, involving a slipping of a thin layer of colluvium with a planar failure surface.
Most landslides started as slides and quickly converted to flows because of the water involved and the steep terrain below the debris sources (Dai et al 1999).
Field checking indicates that the majority of the landslides had the following common features: a source area, defined by a surface of rupture that comprises the main scarp and the scarp floor; a landslide trail down slope of the source where debris transport predominates, though erosion and deposition may also occur; and a deposition fan where the majority of the landslide debris is deposited. It should be noted that a deposition fan may not be well developed for many failures on open slopes because the landslide debris is completely deposited on the trail path.
Frequency of landslides and terrain variables
Landslides that occurred in the study area were correlated with all the factors considered to influence their occurrence. These factors include lithology and structure, slope gradient and slope morphology, slope aspect, elevation, proximity to drainage line, and land use cover. The digital map of landslide distribution was overlaid on the physical parameters or data layers mentioned previously, using the GIS, and the frequency of landslides, which is the number of landslides per square kilometer, was calculated for each category on the factor maps. Table 1 shows the frequency values of the landslides for each subclass variable used to represent terrain parameters that may affect landslide susceptibility. This was achieved through use of the GIS and its ability to quickly determine the area of individual coverage components throughout the study area for each terrain parameter. In Table 1 the predominant terrain variable classes for landslide occurrence, such as slope angle class and elevation zone, can be readily observed.
Landslide susceptibility mapping
Logistic multiple regression
Logistic multiple regression is a multivariate technique that considers several physical parameters that may affect probability. It accepts both binary and scalar values as the independent variables, which allows for the use of variables that are not continuous or qualitatively derived. Logistic multiple regression modeling has several advantages over other multivariate statistical techniques, including multiple regression analysis and discriminant analysis. The dependent variable can have only 2 values—an event occurring or not occurring, and predicted values can be interpreted as probability because they are constrained to fall in the interval between 0 and 1. An appealing S-shaped description of the combined effect of several independent variables on the dependent variable is also an important factor in the popularity of logistic multiple regression modeling.
Logistic multiple regression identifies variables that are significant in predicting the probability of occurrence. In the present study the dependent variable is a binary variable representing the presence or absence of landslides. The technique of logistic multiple regression yields coefficients for each variable based on data derived from samples taken across a study area. These coefficients serve as weights in an algorithm that can be used in the GIS database to produce a map depicting the probability of landslide occurrence.
Quantitatively, the relationship between the occurrence and its dependency on several variables can be expressed as
where Pr(event) is the probability of an event occurring. In the present situation the Pr(event) is the estimated probability of landslide occurrence. As Z varies from −∞ to +∞, the probability varies from 0 to 1 on an S-shaped curve. However, in a strict sense, it is not a probability because the dynamic variables triggering landslides, such as rainfall, are not accounted for. It may be more appropriate to term it hereafter “landslide susceptibility” based on the quasi-static physical parameters. Z is the linear combination where Bi (i = 0,1,…,n) must be estimated from the sample data, n is the number of independent variables (ie, landslide-related physical parameters), and Xi (i = 1,2,…n) is the independent variable. In logistic multiple regression a coding scheme should be selected for the categorical variables by creating a new set of variables that correspond in some way to the original categories. The number of new variables required to represent a categorical variable is one less than the number of categories. The parameters of the logistic multiple regression model are estimated using the maximum likelihood method.Logistic multiple regression modeling is intended to describe the likelihood of landslide occurrence on a regional scale and is very suitable for the assessment of landslide susceptibility because the observed data consist of locations (points) or cells with a value of 0 (absence of landslide) or 1 (presence of landslide). This method allows a spatial distribution of probabilities or susceptibility values to be calculated within the GIS environment.
Variable selection and sampling
All terrain variables considered relevant to the occurrence of landslides, as shown in Table 1, were selected as the initial categorical variables in the present study. For each factor the same categorization scheme used previously to study the relation of landslide frequency to factor classes was adopted for consistency.
For the purpose of the statistical analysis, sample data representing both the absence and the presence of landslides must be provided to fit the logistic multiple regression model. The way in which these data are obtained will affect both the nature of the regression relation and the nature and accuracy of the resulting estimates. In this study the data set of landslide inventory is an indispensable data source representative of samples of landslide presence. All locations of the 2135 landslides studied were thus used to extract the physical parameters (independent variables) automatically from the existing data layers. To eliminate bias in the sampling process, an equal number of points were chosen from the area not yet affected by landslides as samples representing the absence of a landslide. These locations were obtained using a spatially uniform sampling scheme but excluding a 40-m buffer zone for all landslides. Each sample point has its respective binary value for the presence or absence of landslide, as well as information on independent variables. The training data were then used to input to the logistic multiple regression algorithm within the SPSS, a desktop statistical software package, to obtain the coefficients for the logistic multiple regression model.
Modeling results
A logistic multiple regression model was initially constructed based on the physical parameters defined previously. Then, at each step the variables were evaluated for removal, one by one, if they did not contribute sufficiently to the regression equation. In the present analysis, the likelihood-ratio test was always used to determine whether variables should be added to the model. This involved estimating the model with each variable eliminated in turn and looking at the change in the logarithm of likelihood when each variable was deleted. If the observed significance level was greater than the probability of remaining in the model (0.1 in this study), the variable was removed from the model, and the model statistics were recalculated to determine whether any other variables were eligible for removal. The variables included in the model were slope gradient, lithology, elevation, slope aspect, and land use cover. Both proximity to drainage line and slope shape were not significant and were thus eliminated from the stepwise procedure.
The modeling result indicated that the model produced a concordance rate of 81.7% and that 85.2% of the actual landslides were correctly classified with the use of 0.5 as a classification cut-off value. To map future landslide susceptibility in the study area, the logistic multiple regression model obtained was then transferred into the ArcView GIS and applied to the independent variables representing the existing conditions for each cell within the study area. For general purposes, the range of landslide susceptibility was classified into 4 categories: (1) very low susceptibility (0–0.2), (2) low susceptibility (0.2–0.35), (3) moderate susceptibility (0.35–0.5), and (4) high susceptibility (>0.5). The ranges of the individual classes were derived based on the histogram of the estimated landslide susceptibility.
The final product of the analysis is shown in Figure 3. Zones classified as having “very low” susceptibility are distributed in clusters on the coastal lowland and footslope areas and on the top of high mountains characterized by relatively gentle gradients. All these sites are generally stable and not prone to landslides. Zones of “low” susceptibility are relatively dispersed; the combination of factors is not very likely to adversely affect slope stability, and chances that landslides will occur are small. In zones with “moderate” susceptibility the combination of factors may adversely influence slope stability. When disturbed, the slopes are prone to landslides, and the costs of investigation and preventive measures are likely to be high. The “high”-susceptibility class exhibits a spatial distribution strongly characterized by clustered patterns. This category has a high potential for landslide occurrence and is characterized by relatively high elevations and steep terrain. About 85% of the locations of the identified landslides actually fall within this class, and existing ground conditions are very likely to create further serious landslide problems.
The landslide susceptibility map obtained from this study (Figure 3) provides information that can be used to identify different levels of risk owing to landslides. This in turn can facilitate the implementation of appropriate loss-reduction strategies (prevention, mitigation, avoidance, or all of them) for both existing and future development. The most practical and cost-effective loss reduction method is to avoid areas with relatively high landslide susceptibility. The map can also be used to identify areas where detailed geologic–geotechnical investigations are desirable before development.
However, it should be understood that the landslide susceptibility map is intended primarily for the assessment of landslide susceptibility for planning purposes on a regional scale and that it cannot be used to determine the stability of specific sites. Natural as well as human-induced changes can affect landslide susceptibility in any area, and the absence of past or present landslides does not necessarily mean that landslides will not occur in the future.
Conclusions
With Lantau Island of Hong Kong as a study area, the pertinent landslide characteristics were described, and the relations of landslide frequency with the terrain parameters that contribute to landslide occurrence were presented. GIS tools made possible the production of innovative landslide susceptibility maps. In particular, they facilitated the application of the logistic multiple regression technique. Logistic multiple regression applied to training samples collected from existing data layers considered to be relevant to landslide occurrence made it possible to predict landslide susceptibility at a rate of about 85% concordance. The predicted susceptibilities generated from the model within the GIS environment were in turn used to produce a map of relative landslide susceptibility that can be used to facilitate the implementation of appropriate loss reduction strategies for both existing and future development.